ABSTRACT

Several Sox-Oct transcription factor (TF) combin- 
ations have been shown to cooperate on diverse 
enhancers to determine cell fates. Here, we de- 
veloped a method to quantify biochemically the 
Sox-Oct cooperation and assessed the pairing of 
the high-mobility group (HMG) domains of 11 Sox 
TFs with Oct4 on a series of composite DNA 
elements. This way, we clustered Sox proteins ac- 
cording to their dimerization preferences illustrating 
that Sox HMG domains evolved different propensities 
to cooperate with Oct4. Sox2, Sox14, Sox21 and 
Sox15 strongly cooperate on the canonical element 
but compete with Oct4 on a recently discovered com- 
pressed element. Sry also cooperates on the canon- 
ical element but binds additively to the compressed 
element. In contrast, Sox17 and Sox4 cooperate more
strongly on the compressed than on the canonical 
element. Sox5 and Sox18 show some cooperation 
on both elements, whereas Sox8 and Sox9 compete 
on both elements. Testing rationally mutated Sox 
proteins combined with structural modeling high- 
lights critical amino acids for differential Sox-Oct4 
partnerships and demonstrates that the 
cooperativity correlates with the efficiency in 
producing induced pluripotent stem cells. Our 
results suggest selective Sox-Oct partnerships in 
genome regulation and provide a toolset to study 
protein cooperation on DNA. 

INTRODUCTION 

How regulatory information is genetically encoded is an 
overarching yet unresolved question in genome biology. 



This information is scanned and interpreted by 
sequence-specific transcription factor (TF) proteins. 
However, the biochemical basis for the selective recruit- 
ment of TFs to genomic enhancers that govern spatial and 
temporal gene expression remains elusive. Multiple studies 
have shown that TFs often bind to short and degenerate 
DNA-binding sites that have been discovered computa- 
tionally in huge numbers throughout the genome (1-3). 
Yet, only 1-5% of these binding sites are actually 
occupied by the corresponding TF. How do TFs discrim- 
inate between functional and nonfunctional binding sites? 
It has been shown that TFs have a propensity to cluster 
and are more likely to target genomic regions that are 
co-bound by other factors (4,5). Potentially, enhancers 
of co-expressed genes could share their own distinctive 
'fingerprint' or grammar of DNA motifs that recruit par- 
ticular TF combinations. To predict gene expression 
patterns from DNA sequence and TF concentration 
alone, this motif grammar needs to be decoded. It is 
possible that enhancers of co-expressed genes are only 
loosely defined with an unconstrained arrangement of 
binding motifs over several 100 bps not necessitating the 
direct physical interactions of TFs (4,6). In contrast, the 
motif grammar may include binding sites with constrained 
spacing between them whose recognition is tied to specific 
protein-interaction surfaces of individual TF proteins. 
These protein interactions underlie their developmental 
specificities and selectively target TFs to genomic enhan- 
cers. However, while TF heterodimerization predominates 
among paralogous groups of TFs such as nuclear recep- 
tors (7), helix-loop-helix (8) and leucine zipper families 
(9), examples for the selective dimerization of structurally 
unrelated TFs are sparse. Nevertheless, several studies 
have highlighted the importance of a direct cooperation 
between unrelated TF pairs (10-13). Most prominently, 
the Sox and Oct families of TFs have been shown to co- 
operate to execute key developmental programs (14,15). 
On their own, all 20 mammalian Sox proteins bind to a
CTTTGT-like sequence (2,16,17) while most Oct factors
recognize an octamer related to an ATGCAAAT sequence
(18). By combining these sequences, composite motifs can 
be constructed with different motif orientation and 
spacing (19). Several such composite motifs have been 
found to be functional targets for the synergistic regula- 
tion by Sox and Oct proteins. For example, (i) the 
Sox2-Oct4 pair drives stem-cell pluripotency genes on 
either a 0 or a 3 bp-spaced motif element (20-23); (ii) 
Sox17-Oct4 cooperate during endodermal differentiation
(24) presumably on a compressed motif (19) and (iii) 
Sox2-Brn2 regulates brain development on a sox-oct 
motif with a 6-bp spacer (13). Notably, when the cooperative
binding of Sox2 and Oct4 to DNA is perturbed by
rational mutagenesis its ability to induce pluripotency is
lost (19). Conversely, although wild-type Sox17 cannot
induce pluripotency, a mutated version of Sox17 increases
cooperative binding with Oct4 on pluripotency gene enhancers
and thus has the potential to induce pluripotency.
Such results suggested that there might be a Sox-Oct 
partner code that underlies cell fate decisions (14,15). To 
further investigate whether members of the Sox and Oct 
families evolved features to cooperatively target specific 
enhancer elements, a global assessment of the Sox-Oct
pairing profile is highly desirable. To this end, we have
developed a method to measure heterodimer cooperativity
factors that reveal the mode of TF heterodimerization on composite
DNA elements. We used this method to study the
heterodimerization propensities of representative 
members of all seven major Sox families with Oct4 on a 
range of composite DNA elements. As a result, we found 
that Sox families exhibit markedly different propensities 
to associate with Oct4 on distinctly configured binding 
motifs. By measuring cooperativity factors of rationally 
mutated Sox proteins, we found that the re-engineered 
Sox17EK behaves like an enhanced Sox2. This likely
underlies its improved properties in producing induced 
pluripotent stem cells in lieu of the native pluripotency 
factor Sox2 (19). Together, we demonstrate that 
cooperativity measurements are critical to understand 
TF function and the cis-regulatory logic of developmental
enhancers. 



MATERIALS AND METHODS 

Cloning, protein expression and purification 

The POU domains of mouse Oct4 and high-mobility group
(HMG) domains of mouse Sry, Sox2, 14, 21, 4, 5, 8, 9, 17, 18
and 15 were BP cloned from their respective Imagene clones
(Oct4: IRAKp961K04111Q; Sry: IRAMp995I2211Q;
Sox2: RPCIB731A06406Q; Sox14: IRAKp961K05125Q;
Sox21: IRAKp961C14126Q; Sox4: IRAVp968F02163D;
Sox5: IRAMp995N0310Q; Sox8: IRAVp968H01144D;
Sox9: IRAVp968B0369D; Sox17: OCACo5052D058;
Sox18: IRAVp968E0317D; Sox15: IRCKp5014B242Q)
into a pENTRY vector, pDONR221 (see Supplementary
Table S1C for primer sequences). The resulting
pENTR-constructs were first verified by sequencing
and recombined into either pETG20A or pHisMBP
expression plasmids using the GATEWAY™ technology
(Invitrogen). Constructs were transformed into Escherichia
coli BL21(DE3) cells, grown in 1× Terrific Bertani broth
supplemented with 0.1% glucose and 100 µg/ml ampicillin
until OD600 nm ~0.6-0.8 before inducing with 0.5 mM
isopropyl-β-thiogalactoside at 18°C for ~18 h. Fusion
proteins were purified using previously published protocols
(19,25,26). In short, fusion proteins were purified using an
immobilized metal affinity chromatography step, tag
cleavage using the TEV protease followed by ion-exchange
chromatography and gel filtration.



Electrophoretic mobility shift assays 

Electrophoretic mobility shift assays (EMSAs) were
carried out using forward strand 5′ Cy5-labeled dsDNA
(Sigma Proligo, see Figures 2A and 3A). DNA probes
were prepared by mixing equimolar amounts of complementary
strands in 1× annealing buffer (20 mM Tris-HCl,
pH 8.0; 50 mM MgCl2; 50 mM KCl), heating to 95°C for
5 min and subsequently cooling to 4°C at 1°C/min
in a PCR block. Typical binding reactions contained
100 nM dsDNA with varying concentrations of both Sox
and Oct4 proteins in 1× EMSA buffer [20 mM Tris-HCl
pH 8.0, 0.1 mg/ml bovine serum albumin (Biorad),
50 µM ZnCl2, 100 mM KCl, 10% (v/v) glycerol, 0.1%
(v/v) Igepal CA630 and 2 mM β-mercaptoethanol] and
were incubated for 1 h at 4°C in the dark to reach
binding equilibrium. Reactions were loaded onto a
pre-run 12% native 1× Tris-glycine (25 mM Tris pH
8.3; 192 mM glycine) polyacrylamide gel, and DNA
complexes were separated at 4°C for 30 min at 200 V.
The bands were detected using a Typhoon 9140
PhosphorImager (Amersham Biosciences) and quantified
using the ImageQuant TL software (Amersham
Biosciences).



Cooperativity factor measurement 

As an extension of our homodimer model described pre- 
viously (27), we defined four possible microstates for a 
heterodimer-binding model. The participating species 
are defined as D for DNA, P1 for protein 1 and P2 for
protein 2. The equilibrium dissociation constant of each
individual protein can be represented as in Equation 1,
where [D] is the concentration of free DNA, [P1] and
[P2] are the concentrations of free proteins and [DP1] and [DP2]
are the concentrations of the solitary protein-DNA complexes.



\[ K_{d1} = \frac{[D][P1]}{[DP1]} \quad \text{or} \quad K_{d2} = \frac{[D][P2]}{[DP2]} \tag{1} \]



If P1 and P2 are mixed with DNA in the same tube, the
fourth state representing a ternary complex becomes 
feasible. Dissociation constants of secondary-binding 
events are described by Equation 2, where [DP1P2] is 
the equilibrium concentration of heterodimeric protein- 
DNA complexes. 



\[ K_{d12} = \frac{[DP1][P2]}{[DP1P2]} \quad \text{and} \quad K_{d21} = \frac{[DP2][P1]}{[DP1P2]} \tag{2} \]
With f0, f1, f2 and f3 defined as the fractional concentrations
of the free DNA, monomer-DNA complexes 1 and 2
and the heterodimer-DNA complex, respectively
(f0 + f1 + f2 + f3 = 1), the heterodimer cooperativity factor
ω can be straightforwardly calculated from the experimentally
determined fractional concentrations f0, f1, f2 and f3
as follows:

\[ \omega = \frac{K_{d1}}{K_{d21}} = \frac{K_{d2}}{K_{d12}} = \frac{[D][DP1P2]}{[DP1][DP2]} = \frac{f_0 f_3}{f_1 f_2} \tag{3} \]

As defined here, ω > 1 implies positive cooperativity;
ω = 1 no cooperativity; ω < 1 negative cooperativity. To
reduce errors when calculating ω, we only included measurements
in which each of the four fractional concentrations
was at least 5%. Cooperativity factor heatmaps and
graphs for the Sox-Oct4-DNA combinations were
plotted using R (http://www.r-project.org/) with the
gplots package.
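The calculation in Equation 3 can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used in this study: the function names, the band order (free DNA, DNA-P1, DNA-P2, ternary complex) and the example intensities are assumptions made for this sketch. It normalizes four quantified band intensities to fractional concentrations, applies the 5% inclusion rule and returns the cooperativity factor.

```python
from typing import List, Optional, Sequence

MIN_FRACTION = 0.05  # the 5% inclusion threshold stated in the text


def band_fractions(intensities: Sequence[float]) -> List[float]:
    """Normalize four band intensities to fractions f0, f1, f2, f3.

    Assumed order for this sketch: free DNA, DNA-P1, DNA-P2, ternary complex.
    """
    if len(intensities) != 4:
        raise ValueError("expected exactly four band intensities")
    total = float(sum(intensities))
    if total <= 0 or min(intensities) < 0:
        raise ValueError("band intensities must be non-negative with a positive sum")
    return [value / total for value in intensities]


def cooperativity_factor(
    intensities: Sequence[float], min_fraction: float = MIN_FRACTION
) -> Optional[float]:
    """Return omega = f0*f3 / (f1*f2), or None if any fraction is below the threshold."""
    f0, f1, f2, f3 = band_fractions(intensities)
    if min(f0, f1, f2, f3) < min_fraction:
        return None  # lane excluded, mirroring the 5% rule
    return (f0 * f3) / (f1 * f2)


# Hypothetical lane in arbitrary intensity units (not measured data).
lane = [20.0, 10.0, 10.0, 60.0]
print(cooperativity_factor(lane))  # close to 12, i.e. positive cooperativity
```

Returning None for excluded lanes, rather than a number, keeps unreliable lanes out of any downstream averaging over replicates.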

Structural modeling and analysis 

Homology models for Sox HMG or the Oct4 POU 
domain proteins were generated using I-TASSER with 
the Sox17 HMG (pdb-id 3F27) and the Oct1 POU
(pdb-id 1GT0) as templates
(http://zhanglab.ccmb.med.umich.edu/I-TASSER/) (28). Sox HMGs were
subsequently superimposed onto the Sox2 HMG/Oct1 POU
complex bound to a canonical element (pdb-id 1O4X).
Superpositions, visual inspections and figure generation 
were carried out using PyMol. 



RESULTS AND DISCUSSION 

Sox proteins exhibit diverse protein interaction surfaces 

The 80 amino acid HMG domain of Sox proteins is highly 
conserved across all paralogs (Figure 1A). In accordance with
similar DNA sequence preferences for all Sox proteins (2), 
amino acids that contact DNA bases are nearly invariant 
for all Sox proteins. However, protein contact interfaces 
as defined in structural studies on Sox2 and Oct1 show
some disparity (30) (highlighted as blue empty circles). As
an extension of earlier work on Sox2 and Sox17, we
generated homology models for all Sox families and inspected
the electrostatic charge distribution on the van der
Waals' surface (Figure 1B). The protein surface of Sox
proteins facing Oct4 when bound to canonical sox-oct
motifs shows pronounced differences distinguishing Sox
families. The SoxC, E and F groups contain an acidic 
patch at this interface, the SoxB and SoxG groups are 
highly basic and the SoxD group is largely neutral. We 
have recently shown that residue 57 (numbering according 
to HMG conventions), which is causing the disparate elec- 
trostatic pattern of the Sox families, is critical for the ef- 
fective dimerization of Sox2 with Oct4 on pluripotency 
enhancers (19). To understand how these structural differ- 
ences affect Sox-Oct partnerships, we developed a quanti- 
tative method to study TF cooperation and analysed the 
interaction of 11 Sox proteins with Oct4.



A method for the determination of cooperativity factors 

We first cloned HMG domains of mouse Sox proteins and 
screened for soluble protein expression in a 96-well 
format. Representative members of most Sox families 
were then purified to at least 95% purity. Next, we 
performed EMSAs to quantify the fractional binding of 
Sox proteins and Oct4 on DNA. At equilibrium condi- 
tions, the mixing of the two proteins with DNA results 
in macromolecular complexes corresponding to four 
microstates per EMSA lane: (1) free DNA; (2) 
Sox-bound DNA; (3) Oct4-bound DNA and (4) 
Sox-Oct4-DNA ternary complex (Figure 1C). The abundance
of each microstate at equilibrium is directly proportional
to its Boltzmann weight, which in turn is a function
of the protein concentrations, the equilibrium dissociation
constants and the cooperativity factor, ω (Figure 1D, see
'Materials and Methods' section). Substituting the fractional
contribution of each microstate from equilibrium
experiments into our heterodimerization cooperativity
model allowed us to quantify the cooperativity factor
(Figure 1D). The cooperativity factor is essentially the
fold change in the equilibrium binding constants when a 
protein co-binds, relative to the equilibrium constant for 
solitary binding. Values greater than 1 represent positive 
cooperativity, where both proteins mutually lower their 
free energies of binding. That is, complex formation is 
favoured. Negative cooperativity (ω < 1) represents a
competitive-binding mode, where the protein has a preference
for binding to unbound DNA rather than forming
a ternary complex. Finally, values of about 1 correspond
to additive binding, in which a protein shows no specific
preference between free DNA and DNA that is already bound by
the other protein.
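The four-microstate model described above can be written out as a short forward simulation. The sketch below is illustrative only: the function names and the dissociation constants and concentrations in the example are hypothetical, not values from this study. It assigns each microstate a Boltzmann weight scaled to free DNA = 1, converts the weights to fractions and shows that the ratio f0·f3/(f1·f2) returns the same ω whatever protein concentrations are chosen. The helper for the coupling free energy uses the standard relation ΔΔG = -RT ln ω, which is an addition of this sketch and is evaluated at 4°C, the incubation temperature of the binding reactions.

```python
import math
from typing import Sequence, Tuple

GAS_CONSTANT_KJ = 8.314462618e-3  # kJ/(mol*K)


def microstate_fractions(
    p1: float, p2: float, kd1: float, kd2: float, omega: float
) -> Tuple[float, ...]:
    """Fractions of free DNA, DNA-P1, DNA-P2 and the ternary complex.

    p1 and p2 are free protein concentrations in the same units as kd1 and kd2.
    """
    weights = (
        1.0,                              # free DNA (reference state)
        p1 / kd1,                         # DNA bound by protein 1 only
        p2 / kd2,                         # DNA bound by protein 2 only
        omega * (p1 / kd1) * (p2 / kd2),  # ternary complex
    )
    partition = sum(weights)
    return tuple(w / partition for w in weights)


def recovered_omega(fractions: Sequence[float]) -> float:
    """Cooperativity factor from the four fractions: f0*f3 / (f1*f2)."""
    f0, f1, f2, f3 = fractions
    return (f0 * f3) / (f1 * f2)


def coupling_free_energy(omega: float, temperature_k: float = 277.15) -> float:
    """-RT ln(omega) in kJ/mol; negative when co-binding is favoured."""
    return -GAS_CONSTANT_KJ * temperature_k * math.log(omega)


# Hypothetical parameters (nM); omega is recovered at every concentration pair.
for p1, p2 in [(50.0, 150.0), (200.0, 20.0)]:
    fractions = microstate_fractions(p1, p2, kd1=50.0, kd2=100.0, omega=10.0)
    print(round(recovered_omega(fractions), 6))
```

Because the protein concentrations cancel in the ratio, ω can be read from a single lane without knowing the free protein concentrations, provided all four bands are quantifiable.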

Sox proteins exhibit a unique dimerization preference 
with Oct4 on variant DNA configurations 

We measured cooperativity factors in multiple replicates 
for 11 Sox proteins with Oct4 on nine differently 
configured composite DNA motifs including the 'canon- 
ical' Sox2-Oct4 site found in many embryonic stem cell 
enhancers (23,31), the plus3 site found in the Fgf4 
enhancer (32) and the newly discovered 'compressed' 
element (19). In particular, on the canonical and com- 
pressed elements, we observed differences in the 
cooperativity pattern for the Sox proteins, whereas all 
Sox proteins tested cooperated with Oct4 on the plus3 
element (Figure 2B, C, E and Supplementary Figure S1).
We combined the whole dataset of log2 transformed 
cooperativity factors and created a heat map using 
the hierarchical clustering method implemented in the 
heatmap.2 function of the gplots R package (Figure 2C). The clustering
approach revealed that the Sox proteins can be 
categorized into five separate groups highlighting their 
cooperativity patterns (Figure 2C). Similarly, DNA 
motif configurations cluster into five groups. Cluster II 
contains only the plus3 element displaying cooperative 
recruitment of all Sox-Oct4 pairs. Widely spaced 
elements (plus 2-plus 10; cluster III) exhibit an essen- 
tially additive-binding mode for all proteins under 
study. The plus1 element, however, shows a strongly

Figure 1. (A) Alignment of the amino acid sequences of all mouse Sox high-mobility group (HMG) domains shaded with BOXSHADE. The Sox
subfamilies are indicated to the right. The numbering corresponds to the HMG convention (29). α-Helices are marked with a red bar. The
Phe-Met wedge is indicated with an orange bar below the alignment. DNA interacting residues are marked by black empty circles while Sox-Oct
interacting residues are marked by blue empty circles. Highly conserved and similar sequences are shaded in black or gray. (B) A phylogenetic tree
calculated using PROML (http://caps.ncbs.res.in/iws/proml.html). This simplified tree largely corresponds to the more exhaustive phylogenetic
analysis of Sox factors. Sox subgroups (29) and the amino acids found at position 57 of the HMG domains are indicated in single letter codes.

competitive-binding mode for all Sox-Oct4 pairs. The canonical
element and the compressed element (clusters I and IV) 
exhibit a strong disparity with regard to the Sox-Oct pairs 
they preferentially recruit and determine the clustering of 
Sox proteins. Sox8 and Sox9 (cluster B) are not capable of 
cooperating with Oct4 on either the compressed or the 
canonical element. By contrast, Sox5 and Sox18 (cluster
E) bivalently cooperate on both elements. Clusters A and
D, however, have inverse cooperativity profiles on those
elements. While Sox2, Sox15, Sox14 and Sox21 (cluster A)
strongly cooperate on the canonical element, they are not
capable of co-binding with Oct4 on the compressed
element. Similarly, Sry cooperates on the canonical
element but retains an additive-binding mode on the compressed
element. By contrast, Sox4 and Sox17 (cluster D)
only weakly cooperate on the canonical element 
but strongly cooperate on the compressed element. 
Overall, while the HMG domains of the Sox proteins 
investigated here bind highly similar DNA sequences in 
isolation, they markedly differ in their potential to cooper- 
ate with Oct4. 
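The grouping step can be illustrated with a small, self-contained sketch. It is not the R workflow used for Figure 2C: it re-implements in plain Python what we understand to be the defaults behind heatmap.2 (complete-linkage agglomerative clustering on Euclidean distances), applied to log2-transformed cooperativity factors. The protein labels and ω values are hypothetical placeholders chosen only to show the procedure, with columns standing for the canonical and compressed elements and 0.01 used as the floor value for fully competitive binding.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical omega values (canonical, compressed); NOT data from this study.
OMEGA: Dict[str, Tuple[float, float]] = {
    "protein_A": (30.0, 0.01),
    "protein_B": (25.0, 0.02),
    "protein_C": (2.0, 40.0),
    "protein_D": (3.0, 35.0),
    "protein_E": (0.1, 0.1),
}


def log2_profile(values: Tuple[float, ...]) -> Tuple[float, ...]:
    """Log2-transform so that additive binding (omega = 1) maps to 0."""
    return tuple(math.log2(v) for v in values)


def euclidean(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(
    profiles: Dict[str, Tuple[float, ...]], n_clusters: int
) -> List[List[str]]:
    """Merge the two closest clusters until n_clusters remain.

    Cluster distance is the largest pairwise distance between members.
    """
    clusters = [[name] for name in sorted(profiles)]
    while len(clusters) > max(1, n_clusters):
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(
                euclidean(profiles[a], profiles[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


profiles = {name: log2_profile(values) for name, values in OMEGA.items()}
print(complete_linkage(profiles, 3))
```

With these placeholder values, the proteins that cooperate on only one of the two elements fall into two separate groups, and the protein that competes on both remains on its own.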

Cooperativity patterns of rationally mutated Sox proteins 

We noticed that the cooperativity-based classification of 
Sox proteins shows some relationship with their evolutionary
classification (Figure 1B). Residue 57, which was predicted
to affect the cooperativity of Sox2 and Sox17 with
Oct4, provides a partial mechanistic explanation for this 
result. A lysine (Lys, K) residue at position 57 appears to 
favour cooperativity on canonical and plus3 configuration 
but is not compatible with binding to the compressed 
motif. We have previously shown that residue 57 is a 
critical determinant of the developmental function of 
Sox proteins (19). When this residue is swapped between 
Sox2 and Sox17, that is, the K is replaced by a glutamate
(Glu, E) and vice versa, their biological functions are
interchanged and Sox17EK turns into an inducer of
pluripotency and Sox2KE into a trigger of endodermal 
differentiation. To quantify the effect these mutations 
have on Oct4 cooperation, we compared the cooperativity 
of the mutant HMG domains with their wild-type coun- 
terparts. For these experiments, we used an element 
derived from the enhancer of the Nanog gene that 
behaves similarly to the idealized sox-oct element in
cooperativity measurements (21). We found that the
cooperativity of the Sox17EK protein with Oct4 is
roughly 30 times stronger than that of wild-type Sox17
and even three times stronger than that of wild-type Sox2
(Figure 3B and C).

Sox5 contains an alanine (Ala, A) at position 57 and 
strongly cooperates on both the canonical and compressed 
motifs. Given the pronounced effect of residue 57 on 
Sox-Oct cooperativity, we asked whether the presence of 
an A residue at position 57 in Sox5 might perhaps explain 
its ability to bind Oct4 cooperatively on both the canonical
and compressed motifs. Indeed, we found that the
Sox17EA mutation raises the cooperativity factor of
Sox17 on the canonical element 10-fold and
brings it on par with Sox2 and Sox5 in binding the
canonical sequence (Figure 3B and C).

Next, we asked whether these amino acid swap muta- 
tions also interchange the dimerization propensities on the 
compressed motif. As expected, the Sox17EK mutation
causes a 30-fold drop in cooperation compared to
wild-type Sox17, so that the mutant now behaves like wild-type Sox2.
However, Sox2KE cooperates only marginally better
than wild-type Sox2 on this element, indicating that
further modifications in Sox2 are required to engineer a
Sox17-like dimerization propensity on the compressed
element. Further, introducing the Sox5-like alanine into
Sox17 to generate the Sox17EA protein results in a
20-fold drop in the cooperativity although Sox5 cooperates
strongly. We noted that Sox5 contains a glutamine
(Gln, Q) at position 56, which is unique for the SoxD
group (Figure 1A). All other Sox proteins contain an
alanine at this position. It is conceivable that Gln56
impacts the cooperation of Sox5 on the compressed
element by compensating for Glu57. The lack of both
Gln56 and Glu57 could explain why Sox17EA and
Sox2KA cannot cooperate on the compressed element.

We were intrigued by the observation that Sox17EK
cooperates more strongly with Oct4 on the canonical
element than wild-type Sox2 (Figure 3B and C). We therefore
decided to further explore this by allowing two different
Sox proteins to compete for co-binding with Oct4 in
the same reaction tube (Figure 3D). Consistently,
Sox2-Oct4 complexes predominated when mixed with
Sox17, whereas the situation was reversed in the
presence of Sox17EK (Figure 3D). Interestingly, these
results correlated with our earlier observation that
Sox17EK produces induced pluripotent stem cells more
efficiently than Sox2 in reprogramming cocktails (19).
The enhanced ability of Sox17EK to cooperate with
Oct4 on pluripotency enhancers may thus be the basis
for this observation.



Figure 1. Continued 

Electrostatic surface maps of representative Sox members were calculated as described (26). Positively and negatively charged regions were represented
in red and blue patches, respectively. Homology models for Sox HMGs were generated using I-TASSER (28) and surface patches that differ for Sox
groups are boxed. (C) Illustration of how the microstates of the DNA complexes were quantified using the ImageQuant TL software. The Cy5-labeled
dsDNA migrated differently on the native gel depending on how the proteins and DNA associate. Thus, the fractional contribution of the microstates of
the free DNA (f0), Sox-DNA (f1), Oct4-DNA (f2) and ternary complex (f3) can be quantified. (D) Schematic diagram highlighting the approach to
calculate the cooperativity of TF pairs on composite DNA elements. Boltzmann weights of the respective complexes are denoted as b_D, b_DP1,
b_DP2 and b_DP1P2 and scaled so that b_D = 1. [P1] and [P2] are the concentrations of the free proteins. The cooperativity factor omega does
not depend on the concentration of the reactants but solely on the relative ratios of the four microstates represented by their fractional contributions
measured in (C) (see main text and alternate derivation of the equation in the 'Materials and Methods' section).
Figure 2. (A) Sequences of the idealized composite Sox-Oct labeled probes used. The Sox-binding sites are indicated in orange while the Oct-binding
sites are indicated in blue; (B) Bar plots showing cumulative mean cooperativity factors for 11 Sox HMG domains for elements shown in (A). Raw
values and individual bar plots per element are shown in Supplementary Table S1 and Figure S1. To derive reliable omega values and to minimize
errors in band quantification, the concentrations of Sox HMG and the Oct4 POU were adjusted, such that the fractional contribution of each of the
four microstates was at least 5%. If such conditions could not be established, that is, for maximally competitive binding excluding ternary complexes
as seen on the plus1 element for most Sox HMGs or Sox2-Oct4 pairing on the compressed element, omega values were set to 0.01. Constitutive
cooperativity was not observed in this study. (C) Heat map of cooperativity factors representing the different Sox-Oct4 dimers on the various DNA
motifs. Log2-transformed mean cooperativity factors are expressed in a three-color gradient: red (competitive), white (additive binding) and blue
(positive cooperativity). The matrix was hierarchically clustered using the heatmap.2 function in R with default parameters. Different categorizations
were labeled as Clusters A-E and I-V. Each cooperativity factor was derived from at least 3 and maximally 30 replicates (see Supplementary Table
S1). (D) Summary of the differential assembly dataset grouping Sox HMG domains exhibiting similar Oct4 cooperativity profiles. Candidate amino
acids that likely explain the disparate Oct4 interactions at positions 57 and 64 are shown. (E) Differential assemblies of different Sox HMG members
(50 nM) with the Oct4 POU protein (150 nM) were performed on compressed (left), canonical (center) and plus3 (right) element DNA. The cartoon
to the left symbolizes free DNA (black line), Sox (blue circles) and Oct (orange squares).



Structural determinants for Sox-Oct cooperation

Our findings show that residue 57 is critically important 
for the discrimination between the canonical and 
compressed motifs. To study the structural basis for the 
differential assembly of Sox HMGs, we generated
homology models of several Sox HMG/Oct4 POU
complexes on the canonical element (Supplementary 
Figure S2). We observed that K57 of Sox2 interacts with 
a backbone carbonyl of the POU specific domain 
(Supplementary Figure S2). When K57 is replaced by 
Figure 3. (A) Sequences of the labeled Nanog element probes used (21). The Sox-binding sites are indicated in grey while the Oct-binding sites are
underlined; (B) Representative EMSAs of different Sox proteins with Oct4 on the canonical and compressed motifs. The indicated mutants
refer to amino acid position 57 of the HMG domain. (C) Cooperativity factors for various Sox mutants compared to their wild-type counterparts
expressed as mean ± standard deviation. (D) Competitive EMSA analysis showing that Sox17EK-Oct4 complexes predominate over Sox2-Oct4
complexes, whereas the Sox2-Oct4 complex clearly outcompetes Sox17-Oct4 (lanes 9 and 10). An N- and C-terminally extended Sox2 HMG domain
(2L) comprising 109 amino acids (residues 33-141 of full length Sox2 protein) was used to distinguish the various complexes. The cartoon to the right
symbolizes free DNA (black line), Sox2L (grey-filled circles), Sox17 and Sox17EK (grey empty circles) and Oct4 (black squares).



E57, as in Sox17, Sox8 and Sox9, the negatively charged
carboxyl group of Glu likely causes unfavourable charge-charge
repulsions, leading to a drop in cooperativity. In
contrast, an A57, as found in Sox5 or in Sox17EA, is
compatible with binding.

However, residue 57 alone cannot explain the 
cooperativity profiles of the whole Sox family and there 
must be a combination of contributing elements. For 
example, while the E57 proteins Sox17, Sox18 and Sox4
cooperate strongly on the compressed element, Sox8 and
Sox9 cannot, and Sox18 retains some cooperativity on the
canonical element. The structural modeling suggested 
residue 64 as an additional candidate underlying the dif- 
ferential behaviour of Sox factors in our cooperativity 
assays (Supplementary Figure S2). Residue 64 is placed 
at the interaction interface and shows a strong divergence 
within the Sox family (Figure 2D, Supplementary 
Figure S2). Importantly, a M64E mutation has been 
demonstrated to abrogate Sox2-Oct4 interaction (30). 
Likewise, the charged K64 in Sox8 and Sox9 could 
underlie their inability to interact with Oct4 on canonical
and compressed elements. 

While exhibiting overall similar cooperativity patterns, 
the degree of cooperativity still differs for Sox HMG 
domains with identical residues at positions 57 and 64 
(e.g. Sox14/Sox21 cooperate ~5 times more strongly than
Sox2 on the canonical element). We hypothesize that those
residual differences are due to variations within the flexible
and only poorly conserved C-terminal tail of the HMG 
domain that was shown to contribute to Sox2-Oct4 inter- 
actions (30) (Figure 1A). Experimental structures of 
Sox-HMG-Oct4 combinations on canonical and com- 
pressed elements combined with mutagenesis experiments 
are desirable to put those hypotheses to a test. 



CONCLUSION 

In this study, we have developed a biochemical system that 
enables quantitative measurements for protein 
heterodimerization on DNA and illustrated how such 
measurements can dissect protein partnerships of a 
whole protein family. This approach delivers a thermo- 
dynamic constant, the cooperativity factor, which allows 
discriminating between competitive, additive and coopera- 
tive interaction modes of TF proteins. Our results suggest 
that the protein interaction region of individual Sox HMG 
domains encodes features enabling them to target specific 
enhancers by teaming up with partner factors. By 
contrast, the actual DNA recognition interface is very 
similar and seems unable to explain the functional uniqueness
of individual family members (2,26,30,33). A limitation
of the approach presented here is its low
throughput. However, high throughput methods such as
protein-binding microarrays (34) and HT-SELEX (3) have 
not yet been adapted to identify composite DNA motifs of 
heterodimers. Even if those methods can be adjusted to 
multi-component systems, the development of computa- 
tional tools that can accurately model cooperativity will 
pose a significant challenge. Thus, we expect that our 
method will complement high throughput efforts by 
validating composite motifs and by providing quantitative 
estimates of the physical cooperativity in TF-DNA 
binding. 

For Sox-Oct partnerships and probably many other TF 
pairs, direct cooperativity is likely a major determinant for 
the recruitment to functional-binding sites, and therefore a 
major determinant of cell-type-specific biological function 
(14,35). Thus, interrogating TF heterodimerization will 
allow inferring coding principles for developmental enhan- 
cers and molecular mechanisms for selective and com- 
binatorial enhancer recognition by TF proteins. We have 
previously used such insights to re-engineer the endoderm
differentiation factor Sox17 into an inducer of
pluripotency that speeds up stem-cell production (19). 
The proof-of-concept that TFs can be optimized by 
tweaking their heterodimerization and that their func- 
tion can be rationally altered has broad application in 
stem cell biology and tissue engineering for regenerative 
medicine. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Table 1A-1C, Supplementary Figures 1 
and 2. 



ACKNOWLEDGEMENTS 

The authors are grateful to Andrew Hutchins for critical 
reading of the manuscript and suggestions and to Siew 
Hua Choo for technical support. Author contributions: 
C.K.L.N. and R.J.: conception and design, collection
and/or assembly of data, data analysis and interpretation, 
manuscript writing, final approval of manuscript; S.P.: 
derivation of cooperativity formula, data analysis and in- 
terpretation, manuscript writing; N.X.L., S.C.: collection 
of data; P.K.: financial support, administrative support, 
final approval of manuscript. 

FUNDING 

Funding for open access charge: Agency for Science, 
Technology and Research (A*STAR) Singapore. 

Conflict of interest statement. None declared.

REFERENCES

1. Bryne,J.C., Valen,E., Tang,M.H., Marstrand,T., Winther,O., da
Piedade,I., Krogh,A., Lenhard,B. and Sandelin,A. (2008)
JASPAR, the open access database of transcription factor-binding 
profiles: new content and tools in the 2008 update. Nucleic Acids 
Res., 36, D102-D106. 

2. Badis,G., Berger,M.F., Philippakis,A.A., Talukder,S., 
Gehrke,A.R., Jaeger,S.A., Chan,E.T., Metzler,G., Vedenko,A., 
Chen,X. et al. (2009) Diversity and complexity in DNA 
recognition by transcription factors. Science, 324, 1720-1723. 

3. Jolma,A., Kivioja,T., Toivonen,J., Cheng,L., Wei,G., Enge,M., 
Taipale,M., Vaquerizas,J.M., Yan,J., Sillanpaa,M.J. et al. (2010) 
Multiplexed massively parallel SELEX for characterization of 
human transcription factor binding specificities. Genome Res., 20, 
861-873. 

4. Biggin,M.D. (2011) Animal transcription networks as highly 
connected, quantitative continua. Dev. Cell, 21, 611-626. 

5. Kadonaga,J.T. (2004) Regulation of RNA polymerase II 
transcription by sequence-specific DNA binding factors. Cell, 116, 
247-257. 

6. Mirny,L.A. (2011) Nucleosome-mediated cooperativity between 
transcription factors. Proc. Natl. Acad. Sci. USA, 107, 
22534-22539. 

7. Glass,C.K. (1994) Differential recognition of target genes by 
nuclear receptor monomers, dimers, and heterodimers. Endocr. 
Rev., 15, 391-407. 

8. Grove,C.A., De Masi,F., Barrasa,M.I., Newburger,D.E., 
Alkema,M.J., Bulyk,M.L. and Walhout,A.J. (2009) A 
multiparameter network reveals extensive divergence between 
C. elegans bHLH transcription factors. Cell, 138, 314-327. 

9. Hai,T.W., Liu,F., Coukos,W.J. and Green,M.R. (1989) 
Transcription factor ATF cDNA clones: an extensive family of 
leucine zipper proteins able to selectively form DNA-binding 
heterodimers. Genes Dev., 3, 2083-2090. 

10. Garvie,C.W., Hagman,J. and Wolberger,C. (2001) Structural 
studies of Ets-1/Pax5 complex formation on DNA. Mol. Cell, 8, 
1267-1276. 

11. Hollenhorst,P.C., Chandler,K.J., Poulsen,R.L., Johnson,W.E., 
Speck,N.A. and Graves,B.J. (2009) DNA specificity determinants 
associate with distinct transcription factor functions. PLoS Genet., 
5, e1000778. 






12. Kamachi,Y., Uchikawa,M., Tanouchi,A., Sekido,R. and 
Kondoh,H. (2001) Pax6 and SOX2 form a co-DNA-binding 
partner complex that regulates initiation of lens development. 
Genes Dev., 15, 1272-1286. 

13. Tanaka,S., Kamachi,Y., Tanouchi,A., Hamada,H., Jing,N. and 
Kondoh,H. (2004) Interplay of SOX and POU factors in 
regulation of the Nestin gene in neural primordial cells. 
Mol. Cell. Biol., 24, 8834-8846. 

14. Wilson,M. and Koopman,P. (2002) Matching SOX: partner 
proteins and co-factors of the SOX family of transcriptional 
regulators. Curr. Opin. Genet. Dev., 12, 441-446. 

15. Kondoh,H. and Kamachi,Y. (2010) SOX-partner code for cell 
specification: regulatory target selection and underlying molecular 
mechanisms. Int. J. Biochem. Cell Biol., 42, 391-399. 

16. Nasrin,N., Buggs,C., Kong,X.F., Carnazza,J., Goebl,M. and 
Alexander-Bridges,M. (1991) DNA-binding properties of the 
product of the testis-determining gene and a related protein. 
Nature, 354, 317-320. 

17. van de Wetering,M., Oosterwegel,M., van Norren,K. and 
Clevers,H. (1993) Sox-4, an Sry-like HMG box protein, is a 
transcriptional activator in lymphocytes. EMBO J., 12, 
3847-3854. 

18. Ryan,A.K. and Rosenfeld,M.G. (1997) POU domain family 
values: flexibility, partnerships, and developmental codes. 
Genes Dev., 11, 1207-1225. 

19. Jauch,R., Aksoy,I., Hutchins,A.P., Ng,C.K., Tian,X.F., Chen,J., 
Palasingam,P., Robson,P., Stanton,L.W. and Kolatkar,P.R. (2011) 
Conversion of Sox17 into a pluripotency reprogramming factor 
by reengineering its association with Oct4 on DNA. Stem Cells, 
29, 940-951. 

20. Ambrosetti,D.C., Basilico,C. and Dailey,L. (1997) Synergistic 
activation of the fibroblast growth factor 4 enhancer by Sox2 and 
Oct-3 depends on protein-protein interactions facilitated by a 
specific spatial arrangement of factor binding sites. Mol. Cell. 
Biol., 17, 6321-6329. 

21. Rodda,D.J., Chew,J.L., Lim,L.H., Loh,Y.H., Wang,B., Ng,H.H. 
and Robson,P. (2005) Transcriptional regulation of nanog by 
OCT4 and SOX2. J. Biol. Chem., 280, 24731-24737. 

22. Nishimoto,M., Fukushima,A., Okuda,A. and Muramatsu,M. 
(1999) The gene for the embryonic stem cell coactivator UTF1 
carries a regulatory element which selectively interacts with a 
complex composed of Oct-3/4 and Sox-2. Mol. Cell. Biol., 19, 
5453-5465. 

23. Boyer,L.A., Lee,T.I., Cole,M.F., Johnstone,S.E., Levine,S.S., 
Zucker,J.P., Guenther,M.G., Kumar,R.M., Murray,H.L., 
Jenner,R.G. et al. (2005) Core transcriptional regulatory circuitry 
in human embryonic stem cells. Cell, 122, 947-956. 



24. Stefanovic,S., Abboud,N., Desilets,S., Nury,D., Cowan,C. and 
Puceat,M. (2009) Interplay of Oct4 with Sox2 and Sox17: a 
molecular switch from stem cell pluripotency to specifying a 
cardiac fate. J. Cell Biol., 186, 665-673. 

25. Ng,C.K., Palasingam,P., Venkatachalam,R., Baburajendran,N., 
Cheng,J., Jauch,R. and Kolatkar,P.R. (2008) Purification, 
crystallization and preliminary X-ray diffraction analysis of the 
HMG domain of Sox17 in complex with DNA. Acta Crystallogr. 
Sect. F Struct. Biol. Cryst. Commun., 64, 1184-1187. 

26. Palasingam,P., Jauch,R., Ng,C.K. and Kolatkar,P.R. (2009) The 
structure of Sox17 bound to DNA reveals a conserved bending 
topology but selective protein interaction platforms. J. Mol. Biol., 
388, 619-630. 

27. BabuRajendran,N., Palasingam,P., Narasimhan,K., Sun,W., 
Prabhakar,S., Jauch,R. and Kolatkar,P.R. (2010) Structure of 
Smad1 MH1/DNA complex reveals distinctive rearrangements of 
BMP and TGF-β effectors. Nucleic Acids Res., 38, 
3477-3488. 

28. Zhang,Y. (2008) I-TASSER server for protein 3D structure 
prediction. BMC Bioinformatics, 9, 40. 

29. Bowles,J., Schepers,G. and Koopman,P. (2000) Phylogeny of the 
SOX family of developmental transcription factors based on 
sequence and structural indicators. Dev. Biol., 227, 239-255. 

30. Remenyi,A., Lins,K., Nissen,L.J., Reinbold,R., Schöler,H.R. and 
Wilmanns,M. (2003) Crystal structure of a POU/HMG/DNA 
ternary complex suggests differential assembly of Oct4 and Sox2 
on two enhancers. Genes Dev., 17, 2048-2059. 

31. Chen,X., Xu,H., Yuan,P., Fang,F., Huss,M., Vega,V.B., Wong,E., 
Orlov,Y.L., Zhang,W., Jiang,J. et al. (2008) Integration of 
external signaling pathways with the core transcriptional network 
in embryonic stem cells. Cell, 133, 1106-1117. 

32. Yuan,H., Corbi,N., Basilico,C. and Dailey,L. (1995) 
Developmental-specific activity of the FGF-4 enhancer requires 
the synergistic action of Sox2 and Oct-3. Genes Dev., 9, 
2635-2645. 

33. Jauch,R., Narasimhan,K., Ng,C.K. and Kolatkar,P.R. (2011) 
Crystal structure of the Sox4 HMG/DNA complex suggests a 
mechanism for the positional interdependence in DNA 
recognition. Biochem. J., December 19 (doi:10.1042/BJ20111768; 
epub ahead of print). 

34. Berger,M.F., Badis,G., Gehrke,A.R., Talukder,S., 
Philippakis,A.A., Pena-Castillo,L., Alleyne,T.M., Mnaimneh,S., 
Botvinnik,O.B., Chan,E.T. et al. (2008) Variation in 
homeodomain DNA binding revealed by high-resolution analysis 
of sequence preferences. Cell, 133, 1266-1276. 

35. Kamachi,Y., Uchikawa,M. and Kondoh,H. (2000) Pairing SOX 
off: with partners in the regulation of embryonic development. 
Trends Genet., 16, 182-187. 



